torchvision.models
The models subpackage contains definitions of models for addressing different tasks, including image classification, pixelwise semantic segmentation, object detection, instance segmentation, person keypoint detection and video classification.

Note
Backward compatibility is guaranteed for loading a serialized state_dict into a model created using an old PyTorch version. In contrast, loading entire saved models or serialized ScriptModules (serialized using older versions of PyTorch) may not preserve the historic behaviour. Refer to the following documentation.

Classification

The models subpackage contains definitions for the following model architectures for image classification: AlexNet, VGG, ResNet, SqueezeNet, DenseNet, Inception v3, GoogLeNet, ShuffleNet v2, MobileNetV2, MobileNetV3, ResNeXt, Wide ResNet, MNASNet, EfficientNet, RegNet.

You can construct a model with random weights by calling its constructor:

    import torchvision.models as models
    resnet18 = models.resnet18()
    alexnet = models.alexnet()
    vgg16 = models.vgg16()
    squeezenet = models.squeezenet1_0()
    densenet = models.densenet161()
    inception = models.inception_v3()
    googlenet = models.googlenet()
    shufflenet = models.shufflenet_v2_x1_0()
    mobilenet_v2 = models.mobilenet_v2()
    mobilenet_v3_large = models.mobilenet_v3_large()
    mobilenet_v3_small = models.mobilenet_v3_small()
    resnext50_32x4d = models.resnext50_32x4d()
    wide_resnet50_2 = models.wide_resnet50_2()
    mnasnet = models.mnasnet1_0()
    efficientnet_b0 = models.efficientnet_b0()
    efficientnet_b1 = models.efficientnet_b1()
    efficientnet_b2 = models.efficientnet_b2()
    efficientnet_b3 = models.efficientnet_b3()
    efficientnet_b4 = models.efficientnet_b4()
    efficientnet_b5 = models.efficientnet_b5()
    efficientnet_b6 = models.efficientnet_b6()
    efficientnet_b7 = models.efficientnet_b7()
    regnet_y_400mf = models.regnet_y_400mf()
    regnet_y_800mf = models.regnet_y_800mf()
    regnet_y_1_6gf = models.regnet_y_1_6gf()
    regnet_y_3_2gf = models.regnet_y_3_2gf()
    regnet_y_8gf = models.regnet_y_8gf()
    regnet_y_16gf = models.regnet_y_16gf()
    regnet_y_32gf = models.regnet_y_32gf()
    regnet_x_400mf = models.regnet_x_400mf()
    regnet_x_800mf = models.regnet_x_800mf()
    regnet_x_1_6gf = models.regnet_x_1_6gf()
    regnet_x_3_2gf = models.regnet_x_3_2gf()
    regnet_x_8gf = models.regnet_x_8gf()
    regnet_x_16gf = models.regnet_x_16gf()
    regnet_x_32gf = models.regnet_x_32gf()

We provide pre-trained models, using the PyTorch torch.utils.model_zoo. These can be constructed by passing pretrained=True:

    import torchvision.models as models
    resnet18 = models.resnet18(pretrained=True)
    alexnet = models.alexnet(pretrained=True)
    squeezenet = models.squeezenet1_0(pretrained=True)
    vgg16 = models.vgg16(pretrained=True)
    densenet = models.densenet161(pretrained=True)
    inception = models.inception_v3(pretrained=True)
    googlenet = models.googlenet(pretrained=True)
    shufflenet = models.shufflenet_v2_x1_0(pretrained=True)
    mobilenet_v2 = models.mobilenet_v2(pretrained=True)
    mobilenet_v3_large = models.mobilenet_v3_large(pretrained=True)
    mobilenet_v3_small = models.mobilenet_v3_small(pretrained=True)
    resnext50_32x4d = models.resnext50_32x4d(pretrained=True)
    wide_resnet50_2 = models.wide_resnet50_2(pretrained=True)
    mnasnet = models.mnasnet1_0(pretrained=True)
    efficientnet_b0 = models.efficientnet_b0(pretrained=True)
    efficientnet_b1 = models.efficientnet_b1(pretrained=True)
    efficientnet_b2 = models.efficientnet_b2(pretrained=True)
    efficientnet_b3 = models.efficientnet_b3(pretrained=True)
    efficientnet_b4 = models.efficientnet_b4(pretrained=True)
    efficientnet_b5 = models.efficientnet_b5(pretrained=True)
    efficientnet_b6 = models.efficientnet_b6(pretrained=True)
    efficientnet_b7 = models.efficientnet_b7(pretrained=True)
    regnet_y_400mf = models.regnet_y_400mf(pretrained=True)
    regnet_y_800mf = models.regnet_y_800mf(pretrained=True)
    regnet_y_1_6gf = models.regnet_y_1_6gf(pretrained=True)
    regnet_y_3_2gf = models.regnet_y_3_2gf(pretrained=True)
    regnet_y_8gf = models.regnet_y_8gf(pretrained=True)
    regnet_y_16gf = models.regnet_y_16gf(pretrained=True)
    regnet_y_32gf = models.regnet_y_32gf(pretrained=True)
    regnet_x_400mf = models.regnet_x_400mf(pretrained=True)
    regnet_x_800mf = models.regnet_x_800mf(pretrained=True)
    regnet_x_1_6gf = models.regnet_x_1_6gf(pretrained=True)
    regnet_x_3_2gf = models.regnet_x_3_2gf(pretrained=True)
    regnet_x_8gf = models.regnet_x_8gf(pretrained=True)
    regnet_x_16gf = models.regnet_x_16gf(pretrained=True)
    regnet_x_32gf = models.regnet_x_32gf(pretrained=True)

Instantiating a pre-trained model will download its weights to a cache directory. This directory can be set using the TORCH_MODEL_ZOO environment variable. See torch.utils.model_zoo.load_url() for details.

Some models use modules which have different training and evaluation behavior, such as batch normalization. To switch between these modes, use model.train() or model.eval() as appropriate. See train() or eval() for details.

All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224. The images have to be loaded into a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. You can use the following transform to normalize:

    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])

An example of such normalization can be found in the imagenet example here.

The process for obtaining the values of mean and std is roughly equivalent to:

    import torch
    from torchvision import datasets, transforms as T

    transform = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor()])
    dataset = datasets.ImageNet(".", split="train", transform=transform)

    means = []
    stds = []
    # subset() is a placeholder: the concrete subset that was used is lost
    for img in subset(dataset):
        means.append(torch.mean(img))
        stds.append(torch.std(img))

    mean = torch.mean(torch.tensor(means))
    std = torch.mean(torch.tensor(stds))

Unfortunately, the concrete subset that was used is lost.
For more information about how the mean and std values were obtained, see this discussion or these experiments.

The input sizes of the EfficientNet models depend on the variant. For the exact input sizes check here.

ImageNet 1-crop accuracies

Model                             Acc@1    Acc@5
AlexNet                           56.522   79.066
VGG-11                            69.020   88.628
VGG-13                            69.928   89.246
VGG-16                            71.592   90.382
VGG-19                            72.376   90.876
VGG-11 with batch normalization   70.370   89.810
VGG-13 with batch normalization   71.586   90.374
VGG-16 with batch normalization   73.360   91.516
VGG-19 with batch normalization   74.218   91.842
ResNet-18                         69.758   89.078
ResNet-34                         73.314   91.420
ResNet-50                         76.130   92.862
ResNet-101                        77.374   93.546
ResNet-152                        78.312   94.046
SqueezeNet 1.0                    58.092   80.420
SqueezeNet 1.1                    58.178   80.624
Densenet-121                      74.434   91.972
Densenet-169                      75.600   92.806
Densenet-201                      76.896   93.370
Densenet-161                      77.138   93.560
Inception v3                      77.294   93.450
GoogleNet                         69.778   89.530
ShuffleNet V2 x1.0                69.362   88.316
ShuffleNet V2 x0.5                60.552   81.746
MobileNet V2                      71.878   90.286
MobileNet V3 Large                74.042   91.340
MobileNet V3 Small                67.668   87.402
ResNeXt-50-32x4d                  77.618   93.698
ResNeXt-101-32x8d                 79.312   94.526
Wide ResNet-50-2                  78.468   94.086
Wide ResNet-101-2                 78.848   94.284
MNASNet 1.0                       73.456   91.510
MNASNet 0.5                       67.734   87.490
EfficientNet-B0                   77.692   93.532
EfficientNet-B1                   78.642   94.186
EfficientNet-B2                   80.608   95.310
EfficientNet-B3                   82.008   96.054
EfficientNet-B4                   83.384   96.594
EfficientNet-B5                   83.444   96.628
EfficientNet-B6                   84.008   96.916
EfficientNet-B7                   84.122   96.908
regnet_x_400mf                    72.834   90.950
regnet_x_800mf                    75.212   92.348
regnet_x_1_6gf                    77.040   93.440
regnet_x_3_2gf                    78.364   93.992
regnet_x_8gf                      79.344   94.686
regnet_x_16gf                     80.058   94.944
regnet_x_32gf                     80.622   95.248
regnet_y_400mf                    74.046   91.716
regnet_y_800mf                    76.420   93.136
regnet_y_1_6gf                    77.950   93.966
regnet_y_3_2gf                    78.948   94.576
regnet_y_8gf                      80.032   95.048
regnet_y_16gf                     80.424   95.240
regnet_y_32gf                     80.878   95.340

Alexnet

torchvision.models.alexnet(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.alexnet.AlexNet
AlexNet model architecture from the “One
weird trick…” paper. The required minimum input size of the model is 63x63.
Parameters:
    pretrained (bool) – If True, returns a model pre-trained on ImageNet
    progress (bool) – If True, displays a progress bar of the download to stderr

VGG

torchvision.models.vgg11(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.vgg.VGG
VGG 11-layer model (configuration “A”) from “Very Deep Convolutional Networks For Large-Scale Image Recognition”. The required minimum input size of the model is 32x32.
Parameters:
    pretrained (bool) – If True, returns a model pre-trained on ImageNet
    progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.vgg11_bn(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.vgg.VGG
VGG 11-layer model (configuration “A”) with batch normalization, from “Very Deep Convolutional Networks For Large-Scale Image Recognition”. The required minimum input size of the model is 32x32.
Parameters:
    pretrained (bool) – If True, returns a model pre-trained on ImageNet
    progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.vgg13(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.vgg.VGG
VGG 13-layer model (configuration “B”) from “Very Deep Convolutional Networks For Large-Scale Image Recognition”. The required minimum input size of the model is 32x32.
Parameters:
    pretrained (bool) – If True, returns a model pre-trained on ImageNet
    progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.vgg13_bn(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.vgg.VGG
VGG 13-layer model (configuration “B”) with batch normalization, from “Very Deep Convolutional Networks For Large-Scale Image Recognition”. The required minimum input size of the model is 32x32.
Parameters:
    pretrained (bool) – If True, returns a model pre-trained on ImageNet
    progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.vgg16(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.vgg.VGG
VGG 16-layer model (configuration “D”) from “Very Deep Convolutional Networks For Large-Scale Image Recognition”. The required minimum input size of the model is 32x32.
Parameters:
    pretrained (bool) – If True, returns a model pre-trained on ImageNet
    progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.vgg16_bn(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.vgg.VGG
VGG 16-layer model (configuration “D”) with batch normalization, from “Very Deep Convolutional Networks For Large-Scale Image Recognition”. The required minimum input size of the model is 32x32.
Parameters:
    pretrained (bool) – If True, returns a model pre-trained on ImageNet
    progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.vgg19(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.vgg.VGG
VGG 19-layer model (configuration “E”) from “Very Deep Convolutional Networks For Large-Scale Image Recognition”. The required minimum input size of the model is 32x32.
Parameters:
    pretrained (bool) – If True, returns a model pre-trained on ImageNet
    progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.vgg19_bn(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.vgg.VGG
VGG 19-layer model (configuration “E”) with batch normalization, from “Very Deep Convolutional Networks For Large-Scale Image Recognition”. The required minimum input size of the model is 32x32.
Parameters:
    pretrained (bool) – If True, returns a model pre-trained on ImageNet
    progress (bool) – If True, displays a progress bar of the download to stderr

ResNet

torchvision.models.resnet18(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.resnet.ResNet
ResNet-18 model from “Deep Residual Learning for Image Recognition”.
Parameters:
    pretrained (bool) – If True, returns a model pre-trained on ImageNet
    progress (bool) – If True, displays a progress bar of the download to stderr

Examples using resnet18: Tensor transforms and JIT

torchvision.models.resnet34(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.resnet.ResNet
ResNet-34 model from “Deep Residual Learning for Image Recognition”.
Parameters:
    pretrained (bool) – If True, returns a model pre-trained on ImageNet
    progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.resnet50(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.resnet.ResNet
ResNet-50 model from “Deep Residual Learning for Image Recognition”.
Parameters:
    pretrained (bool) – If True, returns a model pre-trained on ImageNet
    progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.resnet101(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.resnet.ResNet
ResNet-101 model from “Deep Residual Learning for Image Recognition”.
Parameters:
    pretrained (bool) – If True, returns a model pre-trained on ImageNet
    progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.resnet152(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.resnet.ResNet
ResNet-152 model from “Deep Residual Learning for Image Recognition”.
Parameters:
    pretrained (bool) – If True, returns a model pre-trained on ImageNet
    progress (bool) – If True, displays a progress bar of the download to stderr

SqueezeNet

torchvision.models.squeezenet1_0(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.squeezenet.SqueezeNet
SqueezeNet model architecture from the “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and […]

    >>> images = list(image for image in images)
    >>> targets = []
    >>> for i in range(len(images)):
    >>>     d = {}
    >>>     d['boxes'] = boxes[i]
    >>>     d['labels'] = labels[i]
    >>>     targets.append(d)
    >>> output = model(images, targets)
    >>> # For inference
    >>> model.eval()
    >>> x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
    >>> predictions = model(x)
    >>>
    >>> # optionally, if you want to export the model to ONNX:
    >>> torch.onnx.export(model, x, "faster_rcnn.onnx", opset_version = 11)

Parameters:
    pretrained (bool) – If True, returns a model pre-trained on COCO train2017
    progress (bool) – If True, displays a progress bar of the download to stderr
    num_classes (int) – number of output classes of the model (including the background)
    pretrained_backbone (bool) – If True, returns a model with backbone pre-trained on ImageNet
    trainable_backbone_layers (int) – number of trainable (not frozen) resnet layers starting from final block. Valid values are between 0 and 5, with 5 meaning all backbone layers are trainable.

Examples using fasterrcnn_resnet50_fpn: Repurposing masks into bounding boxes, Visualization utilities

torchvision.models.detection.fasterrcnn_mobilenet_v3_large_fpn(pretrained=False, progress=True, num_classes=91, pretrained_backbone=True, trainable_backbone_layers=None, **kwargs)
Constructs a high resolution Faster R-CNN model with a MobileNetV3-Large FPN backbone. It works similarly to Faster R-CNN with ResNet-50 FPN backbone. See fasterrcnn_resnet50_fpn() for more details.
Example:

    >>> model = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_fpn(pretrained=True)
    >>> model.eval()
    >>> x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
    >>> predictions = model(x)

Parameters:
    pretrained (bool) – If True, returns a model pre-trained on COCO train2017
    progress (bool) – If True, displays a progress bar of the download to stderr
    num_classes (int) – number of output classes of the model (including the background)
    pretrained_backbone (bool) – If True, returns a model with backbone pre-trained on ImageNet
    trainable_backbone_layers (int) – number of trainable (not frozen) resnet layers starting from final block. Valid values are between 0 and 6, with 6 meaning all backbone layers are trainable.

torchvision.models.detection.fasterrcnn_mobilenet_v3_large_320_fpn(pretrained=False, progress=True, num_classes=91, pretrained_backbone=True, trainable_backbone_layers=None, **kwargs)
Constructs a low resolution Faster R-CNN model with a MobileNetV3-Large FPN backbone tuned for mobile use-cases. It works similarly to Faster R-CNN with ResNet-50 FPN backbone. See fasterrcnn_resnet50_fpn() for more details.
Example:

    >>> model = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_320_fpn(pretrained=True)
    >>> model.eval()
    >>> x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
    >>> predictions = model(x)

Parameters:
    pretrained (bool) – If True, returns a model pre-trained on COCO train2017
    progress (bool) – If True, displays a progress bar of the download to stderr
    num_classes (int) – number of output classes of the model (including the background)
    pretrained_backbone (bool) – If True, returns a model with backbone pre-trained on ImageNet
    trainable_backbone_layers (int) – number of trainable (not frozen) resnet layers starting from final block. Valid values are between 0 and 6, with 6 meaning all backbone layers are trainable.
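The Faster R-CNN examples above stop at predictions = model(x). A minimal sketch of one possible next step, keeping only confident detections, is given here. It assumes each prediction is a dict holding boxes, labels and scores tensors, as torchvision's detection models return in evaluation mode; the helper name filter_predictions, the 0.5 threshold and the hand-made predictions list are hypothetical and only serve the illustration.

```python
import torch


def filter_predictions(predictions, score_threshold=0.5):
    """Keep only detections whose score is at least score_threshold.

    `predictions` is assumed to be a list with one dict per image, each
    holding 'boxes' (N x 4), 'labels' (N) and 'scores' (N) tensors.
    """
    filtered = []
    for pred in predictions:
        keep = pred["scores"] >= score_threshold  # boolean mask of length N
        filtered.append({key: value[keep] for key, value in pred.items()})
    return filtered


# Hand-made stand-in for `predictions = model(x)` on a single image.
predictions = [{
    "boxes": torch.tensor([[10.0, 20.0, 110.0, 220.0],
                           [30.0, 40.0, 60.0, 80.0]]),
    "labels": torch.tensor([1, 18]),
    "scores": torch.tensor([0.92, 0.31]),
}]

confident = filter_predictions(predictions, score_threshold=0.5)
print(confident[0]["boxes"])  # only the first box passes the threshold
```

The same mask is applied to every tensor in the dict, so boxes, labels and scores stay aligned after filtering.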
RetinaNet

torchvision.models.detection.retinanet_resnet50_fpn(pretrained=False, progress=True, num_classes=91, pretrained_backbone=True, trainable_backbone_layers=None, **kwargs)
Constructs a RetinaNet model with a ResNet-50-FPN backbone.
Reference: “Focal Loss for Dense Object Detection”.
The input to the model is expected to be a list of tensors, each of shape [C, H, W], one for each image, and should be in 0-1 range. Different images can have different sizes.
The behavior of the model changes depending on whether it is in training or evaluation mode.
During training, the model expects both the input tensors and a targets argument (a list of dictionaries), containing:
    boxes (FloatTensor[N, 4]): the ground-truth boxes in [x1, y1, x2, y2] format, with 0 […]

    >>> predictions = model(x)

Parameters:
    pretrained (bool) – If True, returns a model pre-trained on COCO train2017
    progress (bool) – If True, displays a progress bar of the download to stderr
    num_classes (int) – number of output classes of the model (including the background)
    pretrained_backbone (bool) – If True, returns a model with backbone pre-trained on ImageNet
    trainable_backbone_layers (int) – number of trainable (not frozen) resnet layers starting from final block. Valid values are between 0 and 5, with 5 meaning all backbone layers are trainable.

Examples using retinanet_resnet50_fpn: Visualization utilities

SSD

torchvision.models.detection.ssd300_vgg16(pretrained: bool = False, progress: bool = True, num_classes: int = 91, pretrained_backbone: bool = True, trainable_backbone_layers: Optional[int] = None, **kwargs: Any)
Constructs an SSD model with input size 300x300 and a VGG16 backbone.
Reference: “SSD: Single Shot MultiBox Detector”.
The input to the model is expected to be a list of tensors, each of shape [C, H, W], one for each image, and should be in 0-1 range. Different images can have different sizes, but they will be resized to a fixed size before being passed to the backbone.
The behavior of the model changes depending on whether it is in training or evaluation mode.
During training, the model expects both the input tensors and a targets argument (a list of dictionaries), containing:
    boxes (FloatTensor[N, 4]): the ground-truth boxes in [x1, y1, x2, y2] format, with 0 […]

    >>> predictions = model(x)

Parameters:
    pretrained (bool) – If True, returns a model pre-trained on COCO train2017
    progress (bool) – If True, displays a progress bar of the download to stderr
    num_classes (int) – number of output classes of the model (including the background)
    pretrained_backbone (bool) – If True, returns a model with backbone pre-trained on ImageNet
    trainable_backbone_layers (int) – number of trainable (not frozen) resnet layers starting from final block. Valid values are between 0 and 5, with 5 meaning all backbone layers are trainable.

Examples using ssd300_vgg16: Visualization utilities

SSDlite

torchvision.models.detection.ssdlite320_mobilenet_v3_large(pretrained: bool = False, progress: bool = True, num_classes: int = 91, pretrained_backbone: bool = False, trainable_backbone_layers: Optional[int] = None, norm_layer: Optional[Callable[[…], torch.nn.modules.module.Module]] = None, **kwargs: Any)
Constructs an SSDlite model with input size 320x320 and a MobileNetV3 Large backbone, as described in “Searching for MobileNetV3” and “MobileNetV2: Inverted Residuals and Linear Bottlenecks”. See ssd300_vgg16() for more details.
Example:

    >>> model = torchvision.models.detection.ssdlite320_mobilenet_v3_large(pretrained=True)
    >>> model.eval()
    >>> x = [torch.rand(3, 320, 320), torch.rand(3, 500, 400)]
    >>> predictions = model(x)

Parameters:
    pretrained (bool) – If True, returns a model pre-trained on COCO train2017
    progress (bool) – If True, displays a progress bar of the download to stderr
    num_classes (int) – number of output classes of the model (including the background)
    pretrained_backbone (bool) – If True, returns a model with backbone pre-trained on ImageNet
    trainable_backbone_layers (int) – number of trainable (not frozen) resnet layers starting from final block. Valid values are between 0 and 6, with 6 meaning all backbone layers are trainable.
    norm_layer (callable, optional) – Module specifying the normalization layer to use.

Examples using ssdlite320_mobilenet_v3_large: Visualization utilities

Mask R-CNN

torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=False, progress=True, num_classes=91, pretrained_backbone=True, trainable_backbone_layers=None, **kwargs)
Constructs a Mask R-CNN model with a ResNet-50-FPN backbone.
Reference: “Mask R-CNN”.
The input to the model is expected to be a list of tensors, each of shape [C, H, W], one for each image, and should be in 0-1 range. Different images can have different sizes.
The behavior of the model changes depending on whether it is in training or evaluation mode.
During training, the model expects both the input tensors and a targets argument (a list of dictionaries), containing:
    boxes (FloatTensor[N, 4]): the ground-truth boxes in [x1, y1, x2, y2] format, with 0 […]

    >>> predictions = model(x)
    >>>
    >>> # optionally, if you want to export the model to ONNX:
    >>> torch.onnx.export(model, x, "mask_rcnn.onnx", opset_version = 11)

Parameters:
    pretrained (bool) – If True, returns a model pre-trained on COCO train2017
    progress (bool) – If True, displays a progress bar of the download to stderr
    num_classes (int) – number of output classes of the model (including the background)
    pretrained_backbone (bool) – If True, returns a model with backbone pre-trained on ImageNet
    trainable_backbone_layers (int) – number of trainable (not frozen) resnet layers starting from final block. Valid values are between 0 and 5, with 5 meaning all backbone layers are trainable.

Examples using maskrcnn_resnet50_fpn: Visualization utilities

Keypoint R-CNN

torchvision.models.detection.keypointrcnn_resnet50_fpn(pretrained=False, progress=True, num_classes=2, num_keypoints=17, pretrained_backbone=True, trainable_backbone_layers=None, **kwargs)
Constructs a Keypoint R-CNN model with a ResNet-50-FPN backbone.
Reference: “Mask R-CNN”.
The input to the model is expected to be a list of tensors, each of shape [C, H, W], one for each image, and should be in 0-1 range. Different images can have different sizes.
The behavior of the model changes depending on whether it is in training or evaluation mode.
During training, the model expects both the input tensors and a targets argument (a list of dictionaries), containing:
    boxes (FloatTensor[N, 4]): the ground-truth boxes in [x1, y1, x2, y2] format, with 0 […]

    >>> predictions = model(x)
    >>>
    >>> # optionally, if you want to export the model to ONNX:
    >>> torch.onnx.export(model, x, "keypoint_rcnn.onnx", opset_version = 11)

Parameters:
    pretrained (bool) – If True, returns a model pre-trained on COCO train2017
    progress (bool) – If True, displays a progress bar of the download to stderr
    num_classes (int) – number of output classes of the model (including the background)
    num_keypoints (int) – number of keypoints, default 17
    pretrained_backbone (bool) – If True, returns a model with backbone pre-trained on ImageNet
    trainable_backbone_layers (int) – number of trainable (not frozen) resnet layers starting from final block. Valid values are between 0 and 5, with 5 meaning all backbone layers are trainable.

Examples using keypointrcnn_resnet50_fpn: Visualization utilities

Video classification

We provide models for action recognition pre-trained on Kinetics-400. They have all been trained with the scripts provided in references/video_classification.

All pre-trained models expect input videos normalized in the same way, i.e. mini-batches of 3-channel RGB videos of shape (3 x T x H x W), where H and W are expected to be 112, and T is the number of video frames in a clip. The frames have to be loaded into a range of [0, 1] and then normalized using mean = [0.43216, 0.394666, 0.37645] and std = [0.22803, 0.22145, 0.216989].

Note
The normalization parameters are different from the image classification ones, and correspond to the mean and std from Kinetics-400.

Note
For now, normalization code can be found in references/video_classification/transforms.py, see the Normalize function there. Note that it differs from standard normalization for images because it assumes the video is 4d.
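The per-channel arithmetic for a 4d clip can be sketched as follows. This is not the Normalize function from references/video_classification/transforms.py, only a minimal illustration of the same idea using the Kinetics-400 statistics quoted above; the helper name normalize_clip and the random clip are hypothetical.

```python
import torch

# Kinetics-400 statistics quoted in the text above.
KINETICS_MEAN = torch.tensor([0.43216, 0.394666, 0.37645])
KINETICS_STD = torch.tensor([0.22803, 0.22145, 0.216989])


def normalize_clip(clip):
    """Normalize a float clip of shape (3, T, H, W) with values in [0, 1]."""
    # Reshape the statistics to (3, 1, 1, 1) so that they broadcast over
    # the time, height and width dimensions of the clip.
    mean = KINETICS_MEAN.view(3, 1, 1, 1)
    std = KINETICS_STD.view(3, 1, 1, 1)
    return (clip - mean) / std


# Stand-in for a decoded 16-frame clip already scaled to [0, 1].
clip = torch.rand(3, 16, 112, 112)
normalized = normalize_clip(clip)
print(normalized.shape)  # torch.Size([3, 16, 112, 112])
```

The only difference from image normalization is the extra time dimension: the statistics are reshaped to four dimensions instead of three.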
Kinetics 1-crop accuracies for clip length 16 (16x112x112)

Network          Clip acc@1   Clip acc@5
ResNet 3D 18     52.75        75.45
ResNet MC 18     53.90        76.29
ResNet (2+1)D    57.50        78.81

ResNet 3D

torchvision.models.video.r3d_18(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.video.resnet.VideoResNet
Construct 18 layer Resnet3D model as in https://arxiv.org/abs/1711.11248
Parameters:
    pretrained (bool) – If True, returns a model pre-trained on Kinetics-400
    progress (bool) – If True, displays a progress bar of the download to stderr
Returns: R3D-18 network
Return type: nn.Module

ResNet Mixed Convolution

torchvision.models.video.mc3_18(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.video.resnet.VideoResNet
Constructor for 18 layer Mixed Convolution network as in https://arxiv.org/abs/1711.11248
Parameters:
    pretrained (bool) – If True, returns a model pre-trained on Kinetics-400
    progress (bool) – If True, displays a progress bar of the download to stderr
Returns: MC3 Network definition
Return type: nn.Module

ResNet (2+1)D

torchvision.models.video.r2plus1d_18(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.video.resnet.VideoResNet
Constructor for the 18 layer deep R(2+1)D network as in https://arxiv.org/abs/1711.11248
Parameters:
    pretrained (bool) – If True, returns a model pre-trained on Kinetics-400
    progress (bool) – If True, displays a progress bar of the download to stderr
Returns: R(2+1)D-18 network
Return type: nn.Module